More Accurate Fuzzy Text Search for Languages Using Abugida Scripts
نویسندگان
چکیده
Text search is a key step in any kind of information access. For doing it effectively, we can use knowledge about the concerned writing systems. Methods based on such knowledge can give significantly better results for searching text, at least for some languages. This can improve information retrieval in particular and information access in general. In this paper, we present a method for fuzzy text search for languages which use Abugida scripts, e.g. Hindi, Bengali, Telugu, Amharic, Thai etc. We use characteristics of a writing system for fuzzy search and are able to take care of spelling variation, which is very common in these languages. Our method shows an improvement in F-measure of up to 30% over scaled edit distance.
منابع مشابه
Orthographic Syllable as basic unit for SMT between Related Languages
We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts. We show that orthographic syllable level translation significantly outperforms models trained over other basic units (word, morpheme and character) when training over small parallel corpora.
متن کاملDiagnosis of Coronary Artery Disease via a Novel Fuzzy Expert System Optimized by Cuckoo Search
In this paper, we propose a novel fuzzy expert system for detection of Coronary Artery Disease, using cuckoo search algorithm. This system includes three phases: firstly, at the stage of fuzzy system design, a decision tree is used to extract if-then rules which provide the crisp rules required for Coronary Artery Disease detection. Secondly, the fuzzy system is formed by setting the intervals ...
متن کاملUsing Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine
Purpose: the current research aimed to compare the effectiveness of various tags and codes for retrieving images from the Google. Design/methodology: selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features have been apportioned for each group of images separately. In this regard, each group image surr...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007